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?3 '. Abstract 



In this paper we study saturated fraetions of factorial designs un- 
der the perspective of Algebraic Statistics. We define a criterion to 
check whether a fraction is saturated or not with respect to a given 
model. The proposed criterion is based purely on combinatorial ob- 
jects. Our technique is particularly useful when several fractions are 

0> I needed. We also show how to generate random saturated fractions 

r^ ' with given projections, by applying the theory of Markov bases for 

t:j- I contingency tables. 

^D ' Keywords: Estimability; Linear model; Circuits; Graver basis; Univer- 

f^ ', sal Markov basis. 

^^ , 1 Introduction 

C^ . The search for minimal designs to estimate linear models is an active research 

area in the design of experiments. Given a model, saturated fractions have a 
minimum number of points to estimate all the parameters of a given model. 
As a consequence, all information is used to estimate the parameters and 
there is no degree of freedom to estimate the error term. Nevertheless, 
saturated fractions are of common use in sciences and engineering, and they 
become particularly useful for highly expensive experiments, or when time 
limitations impose the choice of the minimum possible number of design 
points. For general reference in the design of experiments, the reader can 
refer to [H] and [3], where the issue of saturated fractions is discussed. 



In this paper we characterize saturated fractions of a factorial design in 
terms of the circuits of their design matrices and we define a criterion to 
check whether a given fraction is saturated or not. This avoids computating 
the determinant of the corresponding design matrix. 

Our work falls within the framework of Algebraic statistics. The ap- 
plication of polynomial algebra to the design of experiments was originally 
presented in [TTj, but with a different point of view. The techniques used 
here are mainly based on the combinatorial and algebraic objects associated 
to the design matrix of the model, such as the circuits, the Graver basis and 
the Universal Grobner basis. These algebraic objects are different bases of 
the toric ideal of the design matrix, which are a special set of polynomials 
originally used in Statistics for the analysis contingency tables. All these 
bases are used to solve enumeration problems, to make non- asymptotic in- 
ference, and to describe the geometric structure of the statistical models for 
discrete data. A recent account of this theory can be found in |10j . 

In this paper, we benefit from the interplay between algebraic techniques 
for the analysis of contingency tables and some topics of the design of ex- 
periments. We identify a factorial design with a binary contingency table 
whose entries are the indicator function of the fraction, i.e., they are equal 
to 1 for the fraction points and for the other points. This implies that 
a fraction can also be considered as a subset of cells of the table. Some 
recent results in this direction can be found in [T]. The connections between 
experimental designs and contingency tables have also been explored in |11] . 
but were limited to the investigation of enumerative problems in the special 
cases of contingency tables arising from the Sudoku problems. 

The basic idea underlying our theory is as follows. A fraction is saturated 
if a null linear combination of its points with non-zero coefficients exists. 
This vector of coefficients belongs to the kernel of the transpose of the design 
matrix. IN algebraic language this means generating bases of the toric ideal 
associated to the design matrix. Thus, each null linear combination with 
integer coefficients translates into a binomial of the toric ideal of the design 
matrix. In this paper we prove that if we consider a special basis of the 
toric ideal, namely the circuit basis, this is also a sufficient condition. Our 
approach based on circuits avoids the computation of the determinant of 
the design matrix, and therefore it avoids possible numerical problems. It 
is particularly useful when we need to find several saturated fractions for 
the same model as in the case of algorithms for optimal design generation. 
Furthermore, the circuits can be computed once and for all from the design 
matrix of the full factorial model and do not depend on the fraction. 

The paper is organized along these lines. In Section [2] we set some 



notations and we state the problem. In Section [3] we provide the basic 
algebraic framework to be used in the paper. In Section U] we prove the 
main result, showing that the absence of circuits is a necessary and sufficient 
condition for obtaining a saturated fraction. In Section [5l for unimodular 
design matrices, we report results which highlight the relationship between 
the different bases of the relevant toric ideal, and we show that for several 
models the design matrix is unimodular. In Section [6] we provide some 
examples to demonstrate the practical applicability of our theory. In Section 
[7] we show how to generate a sample of saturated fractions, by applying the 
theory of Markov bases when we add constraints to the projections. Finally, 
in Section [8] we suggest some future research directions stimulated by the 
theory presented here. 

2 Saturated designs 

Let P be a full factorial design with d factors, Ai, . . . , A^ with si, . . . ,Sd 
levels respectively, P = {0, . . . , si — 1} x • • • x {0, . . . , s^ — 1}. We consider 
a linear model on V: 

Y = Xp + £, 

where Y is the response variable, X is the design matrix, (3 is the vector of 
model parameters, and e is a vector of random variables that represent the 
error terms. We denote by p the number of estimable parameters. 

For instance, in a two-factor design with the simple effect model, we have 
p = si + S2 — 1 and a possible design matrix is: 

X = (mo I ao I ... I as-,^2 | 6o I • • • I ^^2-2) , (1) 

where rriQ is a column vector of I's, oq, . . . , oisi-2 are the indicator vectors of 
the first (si — 1) levels of the factor Ai, and ho,. . . ,6s2-2 are the indicator 
vectors of the first (s2 — 1) levels of the factor ^2- 

A subset J-, or fraction, of a full design T>, with minimal cardinality 
#/" = p, that allows us to estimate the model parameters, is a saturated 
fraction or saturated design. By definition, the design matrix Xjr of a satu- 
rated design is a non-singular matrix with dimensions p x p. 

Example 2.1. Let us consider the 2^ design and the model with simple 
effects and 2-way interactions. This example is at the same time not trivial 
and easy to handle, so that we will use it as the running example in this 
paper. The design matrix X of the full design has rank equal to 11 and is 
reported in Figure[l\ As the matrix X has rank 11, we search for fractions 
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Figure 1: The design matrix X of the model in Example 12.11 

with 11 points. For instance the fraction 

J-i = {(0,0, 0,0), (0,0, 0,1), (0,0, 1,1), (0,1, 0,1), (0,1, 1,0), (1,0, 0,1), 
(1,0, 1,0), (1,1,0,0), (1,1,0,1), (1,1, 1,0), (1,1, 1,1)} 

is saturated while 

^2 = {(0,0, 0,0), (0,0, 0,1), (0,0, 1,1), (0,1, 0,0), (0,1, 1,0), (1,0, 0,1), 
(1,0, 1,0), (1,1,0,0), (1,1, 0,1), (1,1, 1,0), (1,1, 1,1)} 

is not. A direct computation shows that there are (-^-^) = 4, 368 fractions 
with 11 points: among them 3,008 are saturated, and the remaining 1,360 
are not. 



3 Designs and contingency tables with Algebraic 
Statistics 

As mentioned in the Introduction, we identify a factorial design with a con- 
tingency table whose entries are the indicator function of the fraction. In 
the previous example, J-i is also a 2^ table with 1 in the cells identified 
by the point coordinates of J^i and otherwise as in Table [TJ To avoid 
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Table 1: 4-way contingency tables N{J^i) and N{T2) from Example 12. li 



mismiderstandings, J- denotes the fraction, while N{T) denotes the corre- 
sponding binary table. In Table [T] the contingency table representations of 
the fractions J^i and J-2 from Example 12.11 are displayed. 

Such an identification leads us to use the Algebraic Statistics tools from 
both contingency tables and Design of Experiments theories. 

In order to give a complete account of our theory and to present our 
algorithm with full details, some definitions and a few basic facts concern- 
ing the algebraic representation of contingency tables and the combinatorial 
properties of some statistical models are reported. For the Algebraic Statis- 
tics notions, the reader can refer to [TO] or [TTj. For details on Algebraic 
notions, see [7] and [H], 

Given a contingency table with K cells, we consider the polynomial ring 
]R[x] = M.[xi, . . . , xk] of all polynomials with indeterminates xi, . . . , xk and 
real coefficients. The relevant polynomial ring has one indeterminate for 
each cell of the table (or equivalently, for each point of the design) . An ideal 
X in M\x\ is a subset of M[x] such that f + g & I for all f,g(zl and fg^X 
for all f ^I and for ah g ^M\x\. 

In Algebraic Statistics, a class of ideals is of special interest. Given a 
s X K non-negative matrix A with integer entries, the toric ideal defined by 
A is the binomial ideal 



Ia = { 



x^-x" : Aa = Ah\ 



where the monomials x" are written in vector notation x"" = x^ ■ ■ ■ x 



K ■ 



Thanks to the Hilbert's basis theorem, every ideal has a finite basis 
{/i, . . . , /„}: for all / € X there are polynomials gi, . . . ,gn € M[x] such that 
/ = Qifi + ■ ■ ■ + gnfn- The computation of a system of generators of an 
ideal is a non trivial task in Computer Algebra. An actual way to do that 
is to compute the reduced Grobner basis of the ideal. The computation of 
a Grobner basis depends on the term-order chosen in the polynomial ring 
M[x], but for a given term-order the reduced Grobner basis is unique and 
can be computed through symbolic software. 

Among all term-orders, the elimination term-order for a given indeter- 
minate, say xk, leads to the Grobner basis of the projection Elim(x/^;T) := 
T n R[xi, . . . ,xk-i\, just taking the Grobner basis of X and removing the 
polynomials involving xk- Applying this fact iteratively. Theorem 4 in |20| 
shows that the statistical counterpart of elimination of indeterminates is the 
definition of a statistical model for incomplete tables. 

As there are finitely many term-orders, there are finitely many Grobner 
bases. 

Definition 3.1. Let Xa he a toric ideal. The union of all reduced Grobner 
bases ofX^ is called the Universal Grobner basis Ua of Xa- 

The computation of the Universal Grobner basis is unfeasible for most 
ideals, but fortunately there are special algorithms for doing that in the case 
of toric ideals. 

There are now several computer systems for handling multivariate poly- 
nomials, see for instance [6] and [8], and all of them compute Grobner bases. 
Recently, also the R package mpoly has been implemented, see |13] . For toric 
ideals the fastest algorithms are implemented in 4ti2, see |26j . 

Together with the Universal Grobner basis, there are other combinatorial 
and polynomial objects derived from a nonnegative integer matrix A. 

Definition 3.2. A binomial f = x"" — x G Xa is primitive if there is no 
binomial g = x'^ — x'^ £ Xa, with g ^ f, such that c < a and d < b. The 
Graver basis GrA of A is the set of all primitive binomials in Xa- 

Definition 3.3. The support of a binomial f = x'^ — x^ is the set of indices 
i fii = 1, . . . , K) such that a{i) ^ or b{i) ^ 0. We denote the support of f 
with supp(/). 

Definition 3.4. An irreducible binomial f = x^ — x € Xa is a circuit 
if there is no other binomial g £ Xa such that supp(5f) C supp(/) and 
supp{g) 7^ supp(/). We denote the set of all circuits ofXA with Ca- 



It is easy to see that every circuit is primitive. Moreover, Theorem 1.1 
in [15] states that 

Ca^Ua^ Gta . (2) 

We will come back on such inclusions later in Section [5l 

Remark 3.1. The notion of Universal Markov basis in Statistics for contin- 
gency tables has been already introduced in fl7]/ and l2^ for the analysis of 
bounded tables. The circuits are shown to be relevant objects for finding all 
the supports of the probability distributions in the closure of an exponential 
family for finite sample spaces, see 123^ . 



4 Characterization of saturated designs 

We are now ready to state the main result, exploiting the connections be- 
tween the saturated fractions and the circuits of the design matrix. Recall 
that the support of a binomial f = x^ — x^ is the set of indices for which 
u > or V > 0. 

In the algebraic theory of toric ideals it is common to use the transpose 
of the design matrix X in place of the design matrix X. In order to simplify 
the notation, we denote hy A = X^ the transpose of the design matrix, and 
with a slight abuse of notation we call it design matrix. As a consequence, 
given J^ = {ii, . . . , ip}, Ajr is the submatrix of A obtained by selecting the 
columns of A according to J-". Note that each column of A identifies a design 
point, and therefore the definition of a set of column-indices is equivalent to 
the definition of the fraction with the corresponding design points. 

Theorem 4.1. Let A be a full-rank design matrix with dimensions p x K 
and let Ca = {fi, ■ ■ ■ , fi} be the set of its circuits. Given a set T of p 
column-indices of A, the submatrix Ajr is non-singular if and only if J^ does 
not contains any of the supports supp(/i), . . . , supp(/l). 

Proof. We prove that Ajr is singular if and only if there is a binomial / in 
Ca with supp(/) C T. 

"^": If Ajr is singular, then there is a null linear combination of its 
columns. As the entries of Ajr are nonnegative integers, the linear combi- 
nation has coefficients in Q, and hence there is a linear combination with 
coefficients in Z. 

Therefore, there exists a non-zero vector ?n, G Z^ with Aj^m = 0. Using 
the positive part of m, rn^ = max{?n,, 0}, and the negative part m~ = 



— iiiin(m,, 0), we decompose m = m~^ — m~ , and the binomial x™ — x™ 
belongs to I^^ the toric ideal associated to A jr. 

As Iajt can be computed from Ia with the elimination algorithm as 
described in Section O and in particular Iaj: = Elim(xi : i = 1, . . . ,K,i ^ 
T;Ia), l-Ajr is non-empty and there is a binomial in the Grobner basis Qa,ti 
where r is the elimination term order for the indeterminates {xi : i = 
1, . . . ,K,i ^ T\ By definition of Universal Grobner basis, this implies that 
there is a binomial g in IAa with support in T . 

If the binomial (7 is a circuit, the proof if complete. If not, there do exist 
a circuit h ^ Ca with supp(/i) C supp(5) and it is enough to choose such 
binomial h. 

"<^": Suppose that there is a circuit f G Ca with support in J^, i.e., 
^m _ ^m jg jj^ 2j^, Hence, Am = and projecting onto the subspace 
]R[xi : i S J-"] we obtain Ajmi = 0, i.e., Ajr is singular. D 

Theorem l4.1l replaces a linear algebra condition (namely the non-singularity 
of the design matrix) with a combinatorial property for checking weather a 
fraction is saturated or not. This technique highlight an interesting combi- 
natorial property of the saturated fractions, with a theoretical interest. It 
may be of limited practical interest when analyzing a single fraction, but it 
becomes useful when we need to study all the saturated fractions of a given 
factorial design. Note that the set of the circuits Ca of a design matrix can 
be computed once for all fractions. 

We are now ready to analyze and discuss some examples. To ease the 
presentation, the binomials are defined through their exponents, so that 
a nonnegative vector a — 6 is used in place of x" — x^. For instance the 
binomial xix|— X2X7X8 inM[xi, . . . ,X8] is written as (1, —1,2,0,0,0, —1, —1). 
This notation is the standard one in 4ti2, the software we use for our 
computations. 

Example 4.1. We consider again the 2^ design and the model with simple 
effects and 2-way interactions, already discussed in Example \2.1\ In less 
than one second, 4ti2 produces the list of all 140 circuits of the design 
matrix. Labeling the design points lexicographically, the 140 circuits can be 
divided into three classes, up to permutations of factors or levels: 

• 20 circuits of the form 

/i = (0,0,0,0,1,-1,-1,1,-1,1,1,-1,0,0,0,0) (3) 

with max |/i| = 1; 
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Table 2: The supports of the three circuits in Example 14.11 The bullet 
symbol denotes the cells in the support 



• J^O circuits of the form 

/2 = (1,-2,0,1,0,1,-1,0,0,1,-1,0,-1,0,2,-1) (4) 

with max I/2I = 2; 

• 80 circuits of the form 

fs = (1,0, -2, 1,0, -1, 1,0, -2, 1,3, -2, 1,0, -2, 1) (5) 

with max j/sj = 3. 

The supports of these circuits are displayed in Figure [3 Note that the 
support of /2 is contained in the fraction T2 of Example \2.1[ making that 
fraction non-saturated. 



Remark 4.1. Theorem \4-l\ allows us to identify saturated designs with 
the feasible solutions of an integer linear programming problem. Let Ca = 
{cij,i = 1, . . . , L, J = 1, . . . , K) be the L x K matrix, whose rows contain the 



values of the indicator functions of the supports of the circuits fi, . . . , fL, 
Cij = ifij / 0),i = l,...,L,j = l,...,K and Y = (yi,...,yx) be the 
K -dimensional column vector that contains the unknown values of the indi- 
cator function of the points of T . The vector Y must satisfy the following 
conditions: 

CaY < b, 

y^ e {0, 1} 

where b = (61, . . . ,6^) is the column vector defined by hi = #supp(/j),i = 
1, . . . , L, and Ik is the column vector of length K and whose entries are all 
equal to 1. 

5 Computational remarks 

We have introduced in Section [3] three different objects associated with the 
design matrix of a model, namely the circuits, the Graver basis and the 
Universal Grobner basis. In general strict inclusions among them holds, as 
stated in Eq. ([2]). Therefore, it is interesting to find models for which such 
three sets coincide. For instance, one may have the Graver basis theoretically 
determined, and in such case no further computations are needed. 
The basic definition for investigating this issue is the following one. 

Definition 5.1. A nonnegative integer matrix with rank p is unimodular if 
all its non-zero p x p minors are equal to ±1. A nonnegative integer matrix 
is totally unimodular if all its non-zero minors are equal to ±1. 

Of course, a totally unimodular matrix is unimodular. It follows imme- 
diately from Definition 15.11 that the entries of a totally unimodular matrix 
are and 1, that each submatrix of a totally unimodular matrix is again to- 
tally unimodular, and the transpose of a totally unimodular matrix is again 
totally unimodular. A couple of less intuitive properties are collected in the 
following proposition. 

Proposition 5.1. Let A be a — 1 matrix with dimensions p x K . 

1. If for each subset J of columns of A, there is a partition {Ji, J2} of J 
such that 

10 



then A is totally unimodular. In particular, if each row contains at 
most 2 non-zero entries, then A is totally unimodular. 

2. All matrices obtained by pivot operations on a totally unimodular ma- 
trix are totally unimodular. 

For the theory of totally unimodular matrices, the reader can refer to 
, Chapters 19 and 20. The following result can be found in [25], page 70. 

Proposition 5.2. If A is a unimodular matrix, then 

Ca=Ua = GrA . (6) 

In view of above properties, we study the first non-trivial model, i.e., 
the no-d-way interaction model for d factors. All other model matrices are 
submatrices of this matrix. For d = 2, we reduce to the independence model, 
and it is known that the design matrix of the independence model is totally 
unimodular for arbitrary si and S2- Indeed, an alternative parametrization 
of the no-2-way interaction for two factors uses the following design matrix 

X = {ao \ ... I a^i-i \ bo \ ... \ 653-2) , (7) 

in place of the design matrix in Eq. ([T|) discussed in Section [2j X satisfies 
the hypotheses of the first item in Proposition 15. li To move to higher 
dimensions, we need to work with Lawrence liftings. Given a matrix A, its 
Lawrence lifting is the block matrix defined by 

A{^) = (^ ',). (8) 

where / is the identity matrix and is a null matrix with suitable dimensions. 

As argued in [25], the Lawrence lifting of a totally unimodular matrix is 
totally unimodular as well, and in [15], Section 4.1 is proved the following 
fact: given a no-d-way interaction model for a si x • • • x s^ design with design 
matrix A, the no-{d + l)-interaction model for the si x • • • x s^^ x 2 is the 
Lawrence lifting A{A) of A. This property is derived with details in J16j . 

Combining all the facts in the discussion above, the following theorem 
is proved. 

Theorem 5.1. The design matrix of all models for asiXS2x2x---x2 
factorial designs is totally unimodular. 
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The simplest model with a non unimodular design matrix is the no-S- 
way interaction model for the 3x3x3 factorial design. Under the name 
of "transportation problem", this model is fully discussed in [25], page 150. 
However in this case the circuits and the Graver basis still coincide. The 
simplest model where circuits and Graver basis differ is the no-3-way inter- 
action model for the 3x3x4 factorial design. The results are presented in 
the next section. 

To compute the circuits and the Graver basis of our running example 
and of the examples in the next section, we used the commands circuits 
and graver in 4ti2, [26]. All computations were carried out in less than 2 
seconds on a standard PC. As usual in Computer Algebra, the computational 
complexity increases quickly as the number of indeterminates rises and it 
depends heavily on the degrees of freedom of the model. For instance, the 
design 2^ under the model with simple effects and 2-way interactions is a 
computationally unfeasible problem on a standard PC. 

Finally, notee that the elements of the bases (Graver or circuits) are the 
coefficients of the linear combination of fraction points to have zero as result, 
and therefore a singular matrix. Therefore, the elements of the bases with 
more than p values different from zero are not of interest of our aims. In 
our running examples, the circuits of the third kind like the circuit /s in Eq. 
([5|) can be excluded, and in the following examples this situation comes up 
in other few cases. 

6 Examples 

In this section we briefly describe the results of our computations for some 
classical models. 

• Design 2^; model with simple factors and 2-way and 3- way interactions. 

The saturated model has 26 points. Circuit basis and Graver basis 
are equal. They contain 3, 254 elements that can be divided into 12 
classes, up to permutations of factors or levels. All of them have 
support cardinality less than 26 (and the maximum cardinality is just 
26). 

• Design 2^; model with simple factors. 

The saturated model has 6 points. Also in this case circuit basis and 
Graver basis are equal. The circuits are 353, 616 elements that can be 
divided into 38 classes, up to permutations of factors or levels. There 
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are 259, 904 circuits with support cardinality equal to 7, and therefore 
the circuits to be checked in our algorithm are 93, 712 with support 
cardinality ranging from 4 to 6. 

• Design 2x3x4; model with simple factors and 2-way interactions. 

The saturated model has 18 points. Circuit basis and Graver basis 
are equal, and they contain 42 elements that can be divided into two 
classes. One class of 24 elements has support cardinality 12, and an- 
other class of 18 elements has support cardinality 8. All of them are 
needed for the algorithm. 

• Design 3x3x4; model with simple factors and 2-way interactions. 

The saturated model has 24 points. Circuit basis and Graver basis 
are different. The latter has 19, 722 elements that can be divided into 
20 classes, while the former has 17, 994 elements with 19 classes, all 
included in the Graver basis. Both bases contain 15, 302 elements with 
support cardinality smaller than or equal to 24. A configuration in the 
Graver basis which is not a circuit is: 

(-1,1,1-1,1,0,0,-1,0,-1,-1,2,0,-1,0,1,-1,2,-1, 

0,1,-1,1,-1,1,0,-1,0,0,-2,1,1,-1,2,0,-1); 

All the elements in the Graver basis but not in the circuit basis are 
permutations of this configuration. 

7 Generation of random saturated fractions 

When a sequence of saturated fractions is needed, instead of a single sat- 
urated fraction, an algorithm for finding such fractions without computing 
the determinant of the design matrix of each fraction may be useful. 

As a first application, one can generate random fractions with p points 
and then check whether they are saturated or not simply by comparing the 
selected fractions with the circuit basis. For instance, we have implemented 
that procedure for the model in our running example, and we are able to 
generate 5,000 random saturated fractions in about 1 second on a standard 
PC, by executing few lines of code. For our simulations, we used R, see [18]. 

However, it is interesting to take a deeper look into the connections 
between Markov bases and saturated fractions. Therefore, we show how to 
apply the theory of Markov bases to generate saturated fractions with given 
projections. Indeed, the combinatorial objects needed to check whether a 
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fraction is saturated or not are essentially the same as those needed to define 
a Markov Chain Monte Carlo sampler. 

Remember that we have already identified a fraction with a binary con- 
tingency table. Given a fraction T ^ the corresponding table is N{T\ where 
-^(-^)n,...,id = 1 if (h, . . . ,irf) is a point of T and -/V(-7^)ii,...,irf = otherwise. 
Using Algebraic statistics tools, we are able to generate all fractions with 
given margins, or projections, through a Markov chain algorithm following 
the theory in pT]. Moreover, we can merge that algorithm with our theory 
in order to select only the saturated fractions with given margins. Interest- 
ingly, the Algebraic and Combinatorial objects needed to define the relevant 
Markov chain for navigating the set of all fractions with given margins and 
for checking whether the fraction is saturated are very close. 

Basic facts about Markov bases are reported. The reader can refer to 
the book [10] for a complete presentation. A Markov move m is a table with 
integer entries such that N(J-), N(J^) + m and N(J^) — m have the same 
margins. A Markov basis A^ is a finite set of Markov moves which makes 
connected the set of all tables, or designs, with the same margins. Notice 
that by adding Markov moves to a Markov basis yields again a Markov basis. 

In standard problems involving contingency tables, Theorem 3.1 in [9] 
states that an actual way for writing a Markov basis of a design matrix A 
is as follows. Compute a Grobner basis of the toric ideal Ia (w.r.t. an 
arbitrary term-order) and then define the moves by taking the logarithms 
of the binomials {p"" — p^ ^ a — b) in the Grobner basis. 

As observed in [2T] and [22], when the cells of the table are bounded, we 
need a special Markov basis, namely the Universal Markov basis, derived 
from the Universal Grobner basis and taking logarithms. In our problem, 
the relevant tables are bounded as each entry can be only or 1. Therefore, 
in what follows we consider the moves of the Universal Markov basis. 

If we start from a fraction with matrix N(J^o), the Markov chain is then 
built as follows: 

• at each step i, randomly choose a Markov move m in A^ and a sign 
£€{±1}; 

• if N{Ti) + em is a binary table, move the chain to A^(J-i-(_i) = N{J^i) + 
em; otherwise, stay in N{Ti). 

The Markov chain described above is a connected chain over all the designs 
with fixed margins, and its stationary distribution is the uniform one. By 
considering the classical Metropolis-Hastings probability ratio, one can de- 
fine a Markov chain converging to any specified probability distribution, see 
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Now, a set of saturated fractions can be extracted from the Markov chain 
above, by comparing each fraction with the supports of the relevant circuits, 
as described in Section HI and discarding the non-saturated fractions. 

Two computational remarks are now in order: (a) the Universal Grobner 
basis coincide with the set of circuits, and the Universal Markov basis does 
not need new computations; (b) due to the limitation to — 1 tables, we can 
discard all the moves with values outside {—1,0, 1} as they do not produce 
valid tables in the algorithm above. 

Example 7.1. We now reconsider the 2^ design with simple effects and 2- 
way interactions, already illustrated in the previous sections. Starting from 
the fraction T in Example \2.1\ we are interested in saturated fractions with 
the same one-way projections. The Universal Grobner basis for this problem 
coincides with the circuits basis and consists o/ 1,348 elements, 532 of them 
have values in {—1,0, 1} and are needed to define the Markov chain. Taking 
such set of moves as input, we are able to produce a sequence of fractions 
with the same one-way projections as Ti. Comparing each fraction with 
the supports of the 120 circuits found in Example \2.1\ we have a random 
sample of saturated fractions. In less than 1 minute, the execution of a sim- 
ple R function yields a sample of 5, 000 saturated fractions with the desired 
projection. 

8 Conclusions 

The theory described in this paper suggests several extensions and applica- 
tions. Firstly, it is interesting to explore how the results can be extended 
for the characterization of saturated fractions to more general designs. 

Secondly, the connections between fractions and graphs need to be stud- 
ied. Indeed, it is known that the circuits for two-factor designs can be de- 
rived by the complete bipartite graph associated with the design, but little 
is known in the general case. 

It would also be interesting to study the classification of the saturated 
fractions with respect to some statistical criteria. Among these criteria, 
we cite the minimum aberration in a classical sense, or more recent tools, 
such as state polytopes. Minimum aberration is a classical notion in this 
framework, and is supported by a large amount of literature. This theory 
has been developed in [12] and more recently in, e.g., |27j with the use of 
the indicator function in the two-level case. The extension to the multilevel 
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case is currently an open problem. The use of state polytopes has been 
introduced in [5]. 

For applications, it would be interesting to define an algorithm to op- 
timize the search of saturated fractions using the circuit basis in the con- 
struction of the fractions instead of checking a random fraction. 

Finally, the use of the inequivalent saturated fractions to perform exact 
tests on model parameters is worth studying, together with its implementa- 
tion in statistical softwares, such as SAS or R. For inequivalent orthogonal 
arrays very interesting results have already been achieved, see [1] and [2]. 
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